AAAI.2019 - Doctoral Consortium

Total: 16

#1 A Theory of State Abstraction for Reinforcement Learning

Author: David Abel

Reinforcement learning presents a challenging problem: agents must generalize experiences, efficiently explore the world, and learn from feedback that is delayed and often sparse, all while making use of a limited computational budget. Abstraction is essential to all of these endeavors. Through abstraction, agents can form concise models of both their surroundings and behavior, supporting effective decision making in diverse and complex environments. To this end, the goal of my doctoral research is to characterize the role abstraction plays in reinforcement learning, with a focus on state abstraction. I offer three desiderata articulating what it means for a state abstraction to be useful, and introduce classes of state abstractions that provide a partial path toward satisfying these desiderata. Collectively, I develop theory for state abstractions that can 1) preserve near-optimal behavior, 2) be learned and computed efficiently, and 3) lower the time or data needed to make effective decisions. I close by discussing extensions of these results to an information-theoretic paradigm of abstraction, and an extension to hierarchical abstraction that enjoys the same desirable properties.
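To make the idea of a state abstraction concrete, the sketch below groups ground states whose action values agree to within a tolerance. The abstract does not name a specific abstraction, so this grouping rule, the function name `build_abstraction`, and the Q-values are all assumptions chosen for illustration, not the classes developed in the thesis.

```python
import math


def build_abstraction(q_values, epsilon):
    """Map each ground state to an abstract state id.

    q_values: dict of state -> tuple of Q-values, one per action (hypothetical numbers).
    States land in the same abstract state when every Q-value falls in the
    same epsilon-wide bucket, so they differ by less than epsilon per action.
    """
    phi = {}
    bucket_ids = {}  # discretized Q-vector -> abstract state id
    for state in sorted(q_values):
        key = tuple(math.floor(q / epsilon) for q in q_values[state])
        if key not in bucket_ids:
            bucket_ids[key] = len(bucket_ids)
        phi[state] = bucket_ids[key]
    return phi


# Hypothetical Q-values for three ground states and two actions.
q_values = {
    "s0": (1.0, 0.2),
    "s1": (1.2, 0.3),
    "s2": (3.0, 0.1),
}
phi = build_abstraction(q_values, epsilon=0.5)
print(phi)  # s0 and s1 share an abstract state; s2 gets its own
```

A coarser `epsilon` merges more states, which shrinks the problem but may cost more in solution quality; that tension is the trade-off the three desiderata address.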

#2 Multi-Agent Coordination under Uncertain Communication

Author: Nikhil Bhargava

Multi-agent coordination is not a simple problem. While significant research has gone into computing plans efficiently and managing competing preferences, the execution of multi-agent plans can still fail even when the plan space is small and agent goals are universally aligned. The reason for this difficulty is that in order to guarantee successful execution of a plan, effective multi-agent coordination requires communication to ensure that all actors have accurate beliefs about the state of the world. My thesis will focus on the problem of characterizing, modeling, and providing efficient algorithms for addressing planning and execution when agents cannot maintain perfect communication.

#3 Adaptive Planning with Evidence Based Prediction for Improved Fluency in Routine Human-Robot Collaborative Tasks

Author: Christopher K. Fourie

This thesis work intends to explore the development of a shared mental model between an autonomous agent and a human, where we aim to promote fluency in continuing interactions defined by repetitive tasks. That is, with repetitive actions, experimentation and increasing iterations, we wish the robot to learn how its own behavior affects that of its partner. To accomplish this, we propose a model that encodes both human and robot actions in a probabilistic space describing the temporal transition points between activities. The purpose of such a model is not only in passive predictive power (understanding the future actions of an associate), but also to encode the latent effect of a robot’s action on the future actions of the associate.

#4 Expressive Real-Time Intersection Scheduling

Author: Rick Goldstein

Traffic congestion is a widespread annoyance throughout global metropolitan areas. It causes increases in travel time, increases in emissions, inefficient usage of gasoline, and driver frustration. Inefficient signal patterns at traffic lights are one major cause of such congestion. Intersection scheduling strategies that make real-time decisions to extend or end a green signal based on real-time traffic data offer one opportunity to reduce congestion and its negative impacts. My research proposes Expressive Real-time Intersection Scheduling (ERIS). ERIS is a decentralized, schedule-driven control method that makes a decision every second based on current traffic conditions to reduce congestion.

#5 Reinforcement Learning for Improved Low Resource Dialogue Generation

Author: Ana V. González-Garduño

In this thesis, I focus on language independent methods of improving utterance understanding and response generation and attempt to tackle some of the issues surrounding current systems. The aim is to create a unified approach to dialogue generation inspired by developments in both goal oriented and open ended dialogue systems. The main contributions in this thesis are: 1) Introducing hybrid approaches to dialogue generation using retrieval and encoder-decoder architectures to produce fluent but precise utterances in dialogues, 2) Proposing supervised, semi-supervised and Reinforcement Learning methods for domain adaptation in goal oriented dialogue and 3) Introducing models that can adapt cross lingually.

#6 Counterfactual Reasoning in Observational Studies

Author: Negar Hassanpour

To identify the appropriate action to take, an intelligent agent must infer the causal effects of every possible action choice. A prominent example is precision medicine, which attempts to identify which medical procedure will benefit each individual patient the most. This requires answering counterfactual questions such as: "Would this patient have lived longer, had she received an alternative treatment?" In my PhD, I explore ways to address the challenges associated with causal effect estimation, with a focus on devising methods that enhance performance according to individual-based measures (as opposed to population-based measures).

#7 Using Automated Agents to Teach Negotiation

Author: Emmanuel Johnson

Negotiation is an integral part of our daily lives regardless of occupation. Although it is ubiquitous in our experience, we are never taught to negotiate. This lack of training has many consequences, from unfair salary negotiations to geopolitical ramifications. The ability to resolve conflicts and negotiate is becoming more critical due to the rise of automated systems that look to replace various repetitive-task jobs. In hopes of improving human negotiation skills, my work seeks to develop automated negotiation agents capable of providing personalized feedback. In this paper, I provide an overview of my past, current, and future work.

#8 Learning Generalized Temporal Abstractions across Both Action and Perception

Author: Khimya Khetarpal

Learning temporal abstractions which are partial solutions to a task and could be reused for other similar or even more complicated tasks is intuitively an ingredient which can help agents to plan, learn and reason efficiently at multiple resolutions of perceptions and time. Just like humans acquire skills and build on top of already existing skills to solve more complicated tasks, AI agents should be able to learn and develop skills continually, hierarchically and incrementally over time. In my research, I aim to answer the following question: How should an agent efficiently represent, learn and use knowledge of the world in continual tasks? My work builds on the options framework, but provides novel extensions driven by this question. We introduce the notion of interest functions. Analogous to temporally extended actions, we propose learning temporally extended perception. The key idea is to learn temporal abstractions unifying both action and perception.

#9 Multi-View Learning from Disparate Sources for Poverty Mapping

Author: Neeti Pokhriyal

Many data analytics problems involve data coming from multiple sources, sensors, modalities or feature spaces that each describe the object of interest in a unique way and typically exhibit heterogeneous properties. The varied data sources are termed views, and the task of learning from such multi-view data is known as multi-view learning. In my thesis, I target the problem of poverty prediction and mapping from multi-source data. Currently, poverty is estimated through intensive household surveys, which are costly and time consuming. The need is to predict poverty in a timely and accurate manner and map it to spatially fine-grained baseline data. The primary aim of my thesis is to develop novel multi-view algorithms that combine disparate data sources for poverty mapping. Another aim of my work is to relax the core assumptions faced by existing multi-view learning algorithms, and produce factorized subspaces.

#10 Numerical Optimization to AI, and Back

Author: Sathya N. Ravi

The impact of numerical optimization on modern data analysis has been quite significant. Today, these methods lie at the heart of most statistical machine learning applications in domains spanning genomics, finance and medicine. The expanding scope of these applications (and the complexity of the associated data) has continued to raise the expectations of various criteria associated with the underlying algorithms. Broadly speaking, my research work can be classified into two AI categories: Optimization in ML (Opt-ML) and Optimization in CV (Opt-CV).

#11 Adaptive Modeling for Risk-Aware Decision Making

Author: Sandhya Saisubramanian

This thesis aims to provide a foundation for risk-aware decision making. Decision making under uncertainty is a core capability of an autonomous agent. A cornerstone for long-term autonomy and safety is risk-aware decision making. A risk-aware model fully accounts for a known set of risks in the environment, with respect to the problem under consideration, and the process of decision making using such a model is risk-aware decision making. Formulating risk-aware models is critical for robust reasoning under uncertainty, since the impact of using less accurate models may be catastrophic in extreme cases due to an overly optimistic view of problems. I propose adaptive modeling, a framework that helps balance the trade-off between model simplicity and risk awareness, for different notions of risk, while remaining computationally tractable.

#12 Parameterized Heuristics for Incomplete Weighted CSPs

Author: Atena M. Tabakhi

The key assumption in Weighted Constraint Satisfaction Problems (WCSPs) is that all constraints are specified a priori. This assumption does not hold in some applications that involve user preferences. Incomplete WCSPs (IWCSPs) extend WCSPs by allowing some constraints to be partially specified. Unfortunately, existing IWCSP approaches either guarantee to return optimal solutions or do not provide any quality guarantees on the solutions found. To bridge the two extremes, we propose a number of parameterized heuristics that allow users to find boundedly-suboptimal solutions, where the error bound depends on user-defined parameters. These heuristics thus allow users to trade off solution quality for fewer elicited preferences and faster computation times.

#13 Imitation Learning from Observation

Author: Faraz Torabi

Humans and other animals have a natural ability to learn skills from observation, often simply from seeing the effects of these skills, without direct knowledge of the underlying actions being taken. For example, after observing an actor doing a jumping jack, a child can copy it despite not knowing anything about what's going on inside the actor's brain and nervous system. The main focus of this thesis is extending this ability to artificial autonomous agents, an endeavor recently referred to as "imitation learning from observation." Imitation learning from observation is especially relevant today due to the accessibility of many online videos that can be used as demonstrations for robots. Meanwhile, advances in deep learning have enabled us to solve increasingly complex control tasks mapping visual input to motor commands. This thesis contributes algorithms that learn control policies from state-only demonstration trajectories. Two types of algorithms are considered. The first type begins by recovering the missing action information from demonstrations and then leverages existing imitation learning algorithms on the full state-action trajectories. Our preliminary work has shown that learning an inverse dynamics model of the agent in a self-supervised fashion and then inferring the actions performed by the demonstrator enables sufficient action recovery for this purpose. The second type of algorithm uses model-free end-to-end learning. Our preliminary results indicate that iteratively optimizing a policy based on the closeness of the imitator's and expert's state transitions leads to a policy that closely mimics the demonstrator's trajectories.
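The first type of algorithm described above can be sketched in a toy tabular setting. This is only an illustration under stated assumptions: the chain environment, the action set, and the helper names (`step`, `learn_inverse_dynamics`, `clone_from_observation`) are hypothetical, and a lookup table stands in for the learned models the thesis would use.

```python
import random
from collections import Counter, defaultdict

ACTIONS = (-1, +1)  # hypothetical action set: step left or step right
N_STATES = 6        # hypothetical chain of states 0..5


def step(state, action):
    """Deterministic toy dynamics: move along the chain, clipped at both ends."""
    return min(max(state + action, 0), N_STATES - 1)


def learn_inverse_dynamics(n_samples=500, seed=0):
    """Self-supervised phase: the agent acts randomly and records, for each
    observed (s, s') pair, which of its own actions most often caused it."""
    rng = random.Random(seed)
    counts = defaultdict(Counter)
    for _ in range(n_samples):
        s = rng.randrange(N_STATES)
        a = rng.choice(ACTIONS)
        counts[(s, step(s, a))][a] += 1
    return {pair: c.most_common(1)[0][0] for pair, c in counts.items()}


def clone_from_observation(demo_states, inverse_model):
    """Infer the demonstrator's missing actions from a state-only trajectory,
    then fit a tabular policy by majority vote (a stand-in for imitation learning)."""
    votes = defaultdict(Counter)
    for s, s_next in zip(demo_states, demo_states[1:]):
        a = inverse_model.get((s, s_next))
        if a is not None:  # skip transitions the agent never experienced itself
            votes[s][a] += 1
    return {s: c.most_common(1)[0][0] for s, c in votes.items()}


demo = [0, 1, 2, 3, 4, 5]  # state-only demonstration: walk right along the chain
inverse_model = learn_inverse_dynamics()
policy = clone_from_observation(demo, inverse_model)
print(policy)
```

The demonstration never contains actions; they are recovered entirely from the agent's own experience, which is the point of the two-stage design.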

#14 Verifiable and Interpretable Reinforcement Learning through Program Synthesis

Author: Abhinav Verma

We study the problem of generating interpretable and verifiable policies for Reinforcement Learning (RL). Unlike the popular Deep Reinforcement Learning (DRL) paradigm, in which the policy is represented by a neural network, the aim of this work is to find policies that can be represented in high-level programming languages. Such programmatic policies have several benefits, including being more easily interpreted than neural networks, and being amenable to verification by scalable symbolic methods. The generation methods for programmatic policies also provide a mechanism for systematically using domain knowledge for guiding the policy search. The interpretability and verifiability of these policies provide the opportunity to deploy RL-based solutions in safety-critical environments. This thesis draws on, and extends, work from both the machine learning and formal methods communities.

#15 Stochastic Goal Recognition Design

Author: Christabel Wayllace

Given an environment and a set of allowed modifications, the task of goal recognition design (GRD) is to select a valid set of modifications that minimizes the maximal number of steps an agent can take before its goal is revealed to an observer. This document presents an extension of GRD to the stochastic domain: the Stochastic Goal Recognition Design (S-GRD). The S-GRD framework aims to consider: (1) Stochastic agent action outcomes; (2) Partial observability of agent states and actions; and (3) Suboptimal agents. In this abstract, we present the progress made towards the final objective as well as a timeline of projected conclusion.
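The quantity GRD minimizes, the maximal number of steps an agent can take before its goal is revealed, can be illustrated for the simplest setting: deterministic actions, full observability, and an explicitly listed set of paths per goal. The sketch below covers only that base case, not the stochastic extension; the path sets, goal names, and function names are hypothetical.

```python
from itertools import combinations


def shared_prefix_len(path_a, path_b):
    """Number of leading steps two action sequences have in common."""
    n = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        n += 1
    return n


def worst_case_distinctiveness(paths_by_goal):
    """Longest prefix shared by paths that lead to two different goals,
    i.e. how long an agent can act without revealing which goal it pursues."""
    worst = 0
    for g1, g2 in combinations(paths_by_goal, 2):
        for p1 in paths_by_goal[g1]:
            for p2 in paths_by_goal[g2]:
                worst = max(worst, shared_prefix_len(p1, p2))
    return worst


# Hypothetical paths an agent may follow to each of two goals.
paths = {
    "A": [("up", "up", "left"), ("left", "up", "up")],
    "B": [("up", "up", "right"), ("right", "up", "up")],
}
before = worst_case_distinctiveness(paths)

# Hypothetical modification: block the initial "up" move, removing paths that begin with it.
modified = {g: [p for p in ps if p[0] != "up"] for g, ps in paths.items()}
after = worst_case_distinctiveness(modified)
print(before, after)  # 2 0
```

Here the modification forces the agent to commit on its first step, so the observer can identify the goal immediately.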

#16 Attention Guided Imitation Learning and Reinforcement Learning

Author: Ruohan Zhang

We propose a framework that uses a learned human visual attention model to guide the learning process of an imitation learning or reinforcement learning agent. We have collected high-quality action and eye-tracking data from humans playing Atari games in a carefully controlled experimental setting. We have shown that incorporating a learned human gaze model into deep imitation learning yields promising results.